<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" 
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">

<html xmlns="http://www.w3.org/1999/xhtml">
  <head>
    <meta http-equiv="content-type" content="text/html; charset=iso-8859-1" />
    <title>Logback Throughput Benchmark</title>
    <link rel="stylesheet" type="text/css" href="css/site.css" />
    <link rel="stylesheet" type="text/css" href="css/_print.css" media="print" />

  </head>
  <body>
    <script type="text/javascript">prefix='';</script>

    <div id="container">

    <script src="templates/header.js" type="text/javascript"></script>
    <div id="left">
      <noscript>Please turn on Javascript to view this menu</noscript>
      <script src="templates/left.js" type="text/javascript"></script>
    </div>
    <div id="right">
      <script src="templates/right.js" type="text/javascript"></script>
    </div>

   
    <div id="content">
      
      <h2>Benchmarking synchronous and asynchronous logging</h2>

      <p>Log4j 1.2 as well as logback have supported asynchronous
      logging for many years by the way of
      <code>AsyncAppender</code>. This appender essentially collects
      newly created logging events, as produced by the application,
      into a circular buffer. The events in this circular buffer are
      then processed by a dedicated worker thread which writes the
      events to their destination, be it a file, a remote server or a
      database.
      </p>

      <p>Thus, from the application's point of view, the cost of
      logging is reduced to placing a logging event in a buffer local
      to the JVM. Such a handing-off operation can be performed with a
      throughput in the ballpark of roughly 1'000'000 operations per
      second or a service duration of a microsecond per operation. If
      the destination of the event is a database with a throughput of
      200 operations per second, the performance gain can be
      tremendous. This very favorable situation occurs if the rate of
      logging event arrival is <em>slower</em> than the rate of
      service, that is the rate at which events are transmitted to
      destination and taken off the circular buffer.</p>

      <p>Asynchronous logging is not a panacea however. It still has
      to obey they laws of the physical universe. In particular, in
      case the rate of arrival of logging events is consistently
      higher than the rate at which the events can be written to
      destination, the circular buffer within
      <code>AsyncAppender</code> will eventually become full, as
      infinite sized buffers do not exist in our universe, and the
      application will be able hand off events to
      <code>AsyncAppender</code>'s buffer only at the rate the events
      are actually written to destination. This is an essential
      limitation of asynchronous logging and all queued/buffered
      systems in general. By the way, computers can be viewed as a
      system of queues.
      </p>

      <p>Here is a more concrete example: suppose for a given hardware
      configuration <code>FileAppender</code> can write to disk at the
      rate of 100'000 events per second, that is with a duration of 10
      microseconds per event written, then, if your application logs
      101'000 events per second, then the throughput observed by your
      application will be 100'000 events written per
      second. AsyncAppender's buffer will not be able to compensate
      for arrival of the extra 1000 events per second. We say that the
      <code>AsyncAppender</code> is overloaded by the arrival rate of
      logging events.
      </p>

      <p>When the <code>AsyncAppender</code> is overloaded, your
      application will be paying the cost of moving events in and out
      of <code>AsyncAppender</code>'s circular buffer in addition to
      sending events to destination. Thus, when we talk about
      asynchronous logging performance, we need to consider the
      performance of logging when the <code>AsyncAppender</code> is
      overloaded. A second consideration is whether asynchronous
      logging buckles under the pressure of many producer threads all
      contending for AsyncAppender's circular buffer.
      </p>
      
      <p>As mentioned earlier, if the rate of event arrival is slower
      than the service rate, the application will only have to
      hand-off logging events to a circular buffer, significantly
      lowering the perceived cost of logging. At the same time, we
      must also consider the worst-case scenario where the rate of
      logging event arrival is higher than the service rate. Such a
      scenario is not that uncommon as slow appenders
      abound. Therefore, as a worst-case analysis, an asynchronous
      logging benchmark should yield information about the overhead of
      asynchronous logging (when overloaded) and its behavior in
      presence of many producer threads.
      </p>

      <p>We have chosen to use <code>FileAppender</code> in
      conjunction with <code>AsyncAppender</code>. Not only is
      <code>FileAppender</code> the most often used appender, it is
      fast enough so that differences in asynchronous logging
      overhead can be noticeable. Had we used <code>DBAppender</code>,
      all asynchronous implementations would have the same performance
      with little or no noticeable differences. </p>


      <h3>Comparing log4j, log4j2 and logback</h3>

      <p>The source code for the benchmarks can be found in the
      <a href="https://github.com/ceki/logback-perf">logback-perf</a>
      github repository. We compare the performance of log4j version 1.2.17,
      log4j 2.14.1 and logback 1.3.0-alpha10. </p>
      
      <p>We have taken care to compare apples to apples. All tests
      specify the same buffer size, i.e. 256KB, for all
      <code>FileAppender</code> instances across frameworks and using
      the same identical pattern, that is <span class="pattern">"%d %p
      [%t] %c - %m%n"</span>.  Lossy behavior (when the buffer is
      full) is turned off for logback's <code>AsyncAppender</code>. It
      is turned off by default for log4j 1.2 as well as log4j2.</p>

      <p>Regarding the circular buffer size for
      <code>AsyncAppender</code> or log4j's LMAX ring buffer, we have
      chosen to leave the default values as is. By default, log4j 1.2
      uses a circular buffer size of 128, log4j2 ring buffer is set to
      262'144 by default and logback is set to a buffer size of
      256. In principle, the much larger ring buffer should favor
      log4j2.
      </p>
      
      <p>Below are the benchmark figures when ran under Windows 10,
      JDK 16, Intel i7-6770HQ CPU and an WDC NVMe WDS512G1XOC hard
      disk with an advertised (and actual) 800MB/sec sequential write
      throughput.</p>

 <!--
     easier cut and paste with emacs marco uusing forward-whitespace with </td><td> insertions
-->      
      <table  class="bodyTable rightAlign striped">
        <tr>
          <th>Threads</th>
          <th>log4j synchronous</th>
          <th>log4j asynchronous</th>
          <th>log4j2 synchronous</th>
          <th>log4j2 asynchronous</th>
          <th>logback 1.3.0 synchronous</th>
          <th>logback 1.3.0 asynchronous</th>
          <th>Unit</th>
        </tr>

        <tr>
          <td>1</td>
          <td>987.08</td>
          <td>745.34</td>
          <td>884.33</td>
          <td>844.67</td>
          <td>2,139.83</td>
          <td>1,760.30</td>
          <td>ops/ms</td>
        </tr>

        <tr>
          <td>2</td>
          <td>542.27</td>
          <td>716.09</td>
          <td>1,220.76</td>
          <td>819.40</td>
          <td>2,276.77</td>
          <td>1,821.36</td>
          <td>ops/ms</td>
        </tr>
        <tr>
          <td>4</td>
          <td>639.86</td>
          <td>676.35</td>
          <td>1,406.60</td>
          <td>770.27</td>
          <td>1,836.99</td>
          <td>1,799.39</td>
          <td>ops/ms</td>          
        </tr>
        <tr>
          <td>8</td>
          <td>633.13</td>
          <td>726.21</td>
          <td>1,257.63</td>
          <td>733.25</td>
          <td>1,787.62</td>
          <td>1,774.99</td>
          <td>ops/ms</td>          
        </tr>
        <tr>
          <td>16</td>
          <td>585.13</td>
          <td>693.74</td>
          <td>1,211.31</td>
          <td>722.34</td>
          <td>1,813.09</td>
          <td>1,815.10</td>
          <td>ops/ms</td>          
        </tr>

        <tr>
          <td>32</td>
          <td>643.85</td>
          <td>657.08</td>
          <td>1,203.27</td>
          <td>704.08</td>
          <td>1,782.81</td>
          <td>1,751.21</td>
          <td>ops/ms</td>          
        </tr>

        <tr>
          <td>64</td>
          <td>576.67</td>
          <td>696.21</td>
          <td>1,236.37</td>
          <td>726.15</td>
          <td>1,740.27</td>
          <td>1,644.81</td>
          <td>ops/ms</td>          
        </tr>
      </table>      

      <p>Here is the table presented as a chart.</p>
      
      <p>
        <a href="images/perf/sync-async-perf.jpg">
          <img src="images/perf/sync-async-perf.jpg" alt="Click to enlarge" height="400"/>
        </a>
      </p>

      <p>The above results show that throughput in synchronous logging
      is actually higher than that of asynchronous logging. This is
      true across frameworks. This result can be attributed to the
      fact that in case of a fast appender, the extra work involved in
      accessing an overloaded ring buffer can be significant. In case
      of log4j2, this extra work is as high as 70%.
      </p>

      <p>Another interesting point is that throughput degrades as the
      number of producer threads increase. Fortunately, the
      degradation is usually within a few percentage points across
      frameworks.
      </p>

      <p class="highlight">logback version 1.3-alpha10 performs about
      3 times faster than log4j and about 1.6 times faster than log4j2
      in case of synchronous logging. For asynchronous logging,
      logback 1.3 performs 2.5 faster than log4j and 2.3 times faster
      than log4j2.</p>

      
      <p>While these points may be somewhat subtle, the not so subtle
      observation is that logback version 1.3-alpha10 performs about 3
      times faster than log4j and about 1.6 times faster than log4j2
      in case of synchronous logging. For asynchronous logging,
      logback 1.3 performs 2.5 faster than log4j and 2.3 times faster
      than log4j2.</p>

      <p>Note that with 2'200'000 events per second, and 209 bytes
      written per event, logback's <code>FileAppender</code> is
      generating output at 474 MB/sec, or at 59% of the hard drive's
      throughput capacity of 800 MB/sec. If nothing else, this is
      quite a testament for the performance of the JVM.</p>
      
      <p>Given that logback uses off the shelf JDK components in
      contrast to log4j2 which uses the LMAX Distruptor, that is
      highly optimized code pervasively using 64-byte cache alignment,
      a.k.a. mechanical sympathy, the above results can be quite the
      surprise.  </p>

<!--2021-08-23 13:02:56,371 DEBUG [ch.qos.logback.perf.AsyncWithFileAppenderBenchmark.logbackFile-jmh-worker-62] ch.qos.logback.perf.jmh_generated.AsyncWithFileAppenderBenchmark_jmhType - This is a debug message-->
      
      <p>How can code optimized for CPU cache alignment be
      significantly slower than code which is supposedly less
      optimized?</p>

      <p>While we do not have a definite answer, it is likely that
      algorithmic differences account for logback's better
      performance. Moreover, in a multi-core CPU running many threads,
      concurrent access requires use of the shared L3 cache, perhaps
      mitigating the advantages of L1 and L2 caches.      
      </p>

      <h3>Replication of results</h3>
      
      <p>The results have been replicated by the author on several
      other computers, including an Intel i7-8565U running Windows, an
      Intel Xeon E5-2650L running Linux under a KVM hypervisor and an
      AMD A10 Micro-6700T running Linux.</p>
      
      
      <script src="templates/footer.js" type="text/javascript"></script>	
    </div>
    </div>
  </body>
</html>
